Beyond Class A: A Proposal for Automatic Evaluation of Discourse
Abstract
Introduction

The DARPA Spoken Language community has just completed the first trial evaluation of spontaneous query/response pairs in the Air Travel (ATIS) domain. Our goal has been to find a methodology for evaluating correct responses to user queries. To this end, we agreed, for the first trial evaluation, to constrain the problem in several ways:

Database Application: Constrain the application to a database query application, to ease the burden of a) constructing the back-end, and b) determining correct responses;

Canonical Answer: Constrain answer comparison to a minimal "canonical answer" that imposes the fewest constraints on the form of system response displayed to a user at each site;

Typed Input: Constrain the evaluation to typed input only;

Class A: Constrain the test set to single unambiguous intelligible utterances taken without context that have well-defined database answers ("class A" sentences).

These were reasonable constraints to impose on the first trial evaluation. However, it is clear that we need to loosen these constraints to obtain a more realistic evaluation of spoken language systems. The purpose of this paper is to suggest how we can move beyond evaluation of class A sentences to an evaluation of connected dialogue, including out-of-domain queries.